Extracting Enterprise Vocabulary Using Linked Open Data

Authors

  • Julian Dolby
  • Achille Fokoue
  • Aditya Kalyanpur
  • Kavitha Srinivas
  • Edith Schonberg

Abstract

A common vocabulary is vital to smooth business operation, yet codifying and maintaining an enterprise vocabulary is an arduous, manual task. We present a fully automated process for creating an enterprise vocabulary, by extracting terms from a domain-specific corpus, and extracting their types from LOD (Linked Open Data). We applied this process to create a vocabulary for the IT industry, using 58 Gartner analyst reports as a corpus, and the LOD subset consisting of DBpedia and Freebase. We present novel techniques for linking, cleansing, and extending the types in this LOD subset, resulting in an improvement of 55% for our IT domain results. We further improved our results through NER over the corpus. Our NER training is completely automated, exploiting Wikipedia and DBpedia. Altogether, we achieved 46.3% recall and 78.1% precision.
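
The recall and precision figures above compare an extracted vocabulary against a reference vocabulary. As a minimal sketch of how such scores are typically computed, assuming simple exact matching on term strings (the abstract does not describe the evaluation protocol), the Python below uses hypothetical helper names and invented toy term sets. The F1 helper is included only for convenience; the abstract does not report an F1 value.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of an extracted term set against a gold set.

    Exact string matching is an assumption made for this sketch.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the paper.
gold = {"cloud computing", "virtualization", "middleware", "data warehouse"}
extracted = {"cloud computing", "virtualization", "quarterly report"}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f1_score(p, r):.3f}")
```

Here two of the three extracted terms are in the reference set, and two of the four reference terms were found, so the toy run is precise but misses half of the vocabulary, the same general shape as the reported results.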

Similar resources

Extracting Enterprise Vocabularies Using Linked Open Data

A common vocabulary is vital to smooth business operation, yet codifying and maintaining an enterprise vocabulary is an arduous, manual task. We describe a process to automatically extract a domain specific vocabulary (terms and types) from unstructured data in the enterprise guided by term definitions in Linked Open Data (LOD). We validate our techniques by applying them to the IT (Information...

SKOS as a Key Element in Enterprise Linked Data Strategies

The challenges in implementing linked data technologies in enterprises are not limited to technical issues only. Projects like these deal also with organisational hurdles to be crossed, for instance the development of employee skills in the area of knowledge modelling and the implementation of a linked data strategy which foresees a cost-effective and sustainable infrastructure of high-quality ...

Extracting Usage Patterns of Ontologies on the Web: a Case Study on GoodRelations Vocabulary in RDFa

The number of publicly available resources that re-use terms from various OWL ontologies has increased massively over the last few years, with the presence of Linked Open Data datasets and the growing number of websites that now embed structured data into HTML pages using markup languages such as RDFa, microdata and microformats. In this paper, we describe an approach to exploratory analysis of ontolog...

Extracting knowledge from web communities and linked data for case-based reasoning systems

Web communities and the Web 2.0 provide a huge amount of experiences and there has been a growing availability of Linked (Open) Data. Making experiences and data available as knowledge to be used in case-based reasoning (CBR) systems is a current research effort. The process of extracting such knowledge from the diverse data types used in web communities, to transform data obtained from Linked ...

Studying the History of Pre-Modern Zoology with Linked Data and Vocabularies

In this paper we first present the international multidisciplinary research network Zoomathia, which aims to study the transmission of zoological knowledge from Antiquity to the Middle Ages through varied resources, and considers especially textual information, including compilation literature such as encyclopaedias. We then present a preliminary work in the context of Zoomathia consisting in (...

Publication date: 2008